Abstract

Monotonic convergence is established for a general class of multiplicative algorithms
introduced by Silvey et al. (1978) for computing optimal designs. A conjecture of Titterington
(1978) is confirmed as a consequence. Optimal designs for logistic regression are used as an
illustration.

1 A general class of algorithms

Optimal experimental design (approximate theory) is a well-developed area and we refer to
Kiefer (1974), Silvey (1980), Pazman (1986), and Pukelsheim (1993) for general introduction
and basic results. We consider computational aspects of optimal designs, focusing on a finite
design space $\mathcal{X} = \{x_1, \ldots, x_n\}$. Suppose the probability density or mass function of the response
is specified as $p(y|x, \theta)$, where $\theta = (\theta_1, \ldots, \theta_m)^T$ is the parameter of interest. Let $A_i$ denote the
$m \times m$ expected Fisher information matrix from a unit assigned to $x_i$, with the $(j, k)$ entry (the
expectation is with respect to $p(y|x_i, \theta)$)

$$A_i(j,k) = E\left[\frac{\partial \log p(y|x_i,\theta)}{\partial \theta_j}\,\frac{\partial \log p(y|x_i,\theta)}{\partial \theta_k}\right].$$

The moment matrix, as a function of the design measure $w = (w_1, \ldots, w_n)$, is defined as

$$M(w) = \sum_{i=1}^n w_i A_i,$$

which is proportional to the Fisher information for $\theta$ when the number of units assigned to $x_i$ is
proportional to $w_i$. Here $w \in \bar{\Omega}$, and $\bar{\Omega}$ denotes the closure of $\Omega = \{w : w_i > 0, \sum_{i=1}^n w_i = 1\}$.
Throughout we assume that $A_i$ are well-defined and hence nonnegative definite. The set

$$\Omega_+ = \{w \in \bar{\Omega} : M(w) > 0 \text{ (positive definite)}\}$$

is assumed nonempty. Our approach may conceivably extend to the case where $M(w)$ is allowed
to be singular, by using generalized inverses, although we do not pursue this here.

Given an optimality criterion $\phi$, defined on positive definite matrices, the goal is to maximize
$\phi(M(w))$ with respect to $w \in \Omega_+$. Typical optimality criteria include

(i) the D-criterion $\phi_0(M) = \log\det(M)$,

(ii) the A-criterion $\phi_{-1}(M) = -\mathrm{tr}(M^{-1})$,

(iii) more generally, the $p$th mean criterion $\phi_p(M) = -\mathrm{tr}(M^p)$, $p < 0$, and

(iv) the c-criterion $\phi_{-1,c}(M) = -c^T M^{-1} c$, where $c$ is a nonzero constant vector.

Often only a linear combination $K^T\theta$, e.g., a subvector of $\theta$, is of interest. The Fisher information
for $K^T\theta$ is naturally defined as $(K^T M^{-1} K)^{-1}$, assuming invertibility (Pukelsheim, 1993). We
may therefore consider the D- and A-criteria for $K^T\theta$ defined respectively as

$$\phi_{0,K}(M) = -\log\det(K^T M^{-1} K); \qquad \phi_{-1,K}(M) = -\mathrm{tr}(K^T M^{-1} K). \tag{1}$$

The c-criterion is a special case of $\phi_{-1,K}(M)$. Motivations for such optimality criteria are
well known. In a linear problem, the A-criterion seeks to minimize the sum of variances of the
best linear unbiased estimators (BLUEs) for all coordinates of $\theta$, while the c-criterion seeks to
minimize the variance of the BLUE for $c^T\theta$. Similar interpretations (with asymptotic arguments)
apply to nonlinear problems.

In general $M(w)$ also depends on the unknown parameter $\theta$, which complicates the definition
of an optimality criterion. A simple solution is to maximize $\phi(M(w))$ with $\theta$ fixed at a prior
guess $\theta^*$; this leads to local optimality (Chernoff 1953). Local optimality may be criticized for
ignoring uncertainty in $\theta$. However, in a situation where real prior information is available, or
where the dependence of $M$ on $\theta$ is weak, it is nevertheless a viable approach, and has been
adopted routinely (see, for example, Li and Majumdar 2008). Henceforth we assume a fixed $\theta^*$
and suppress the dependence of $M$ on $\theta$. Possible extensions are mentioned in Section 5.

Optimal designs do not usually come in closed form. As early as Wynn (1972), Fedorov 
(1972), Atwood (1973), and Wu and Wynn (1978), and as late as Torsney (2007), Harman and 
Pronzato (2007), and Dette et al. (2008), various procedures have been studied for numerical 
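computation. We shall focus on the following multiplicative algorithm (Titterington 1976, 1978;
Silvey et al. 1978), which is specified through a power parameter $\lambda \in (0, 1]$.

Algorithm I Set $\lambda \in (0, 1]$ and $w^{(0)} \in \Omega$. For $t = 0, 1, \ldots$, compute

$$w_i^{(t+1)} = w_i^{(t)}\,\frac{d_i^{\lambda}(w^{(t)})}{\sum_{j=1}^n w_j^{(t)} d_j^{\lambda}(w^{(t)})}, \quad i = 1, \ldots, n, \tag{2}$$

where

$$d_i(w) = \mathrm{tr}(\phi'(M(w))A_i), \qquad \phi'(M) \equiv \frac{\partial \phi(M)}{\partial M}.$$

Iterate until convergence.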

For a heuristic explanation, observe that (2) is equivalent to

$$w_i^{(t+1)} \propto w_i^{(t)} \left(\left.\frac{\partial \phi(M(w))}{\partial w_i}\right|_{w = w^{(t)}}\right)^{\lambda}, \quad i = 1, \ldots, n. \tag{3}$$

The value of $\partial\phi(M(w))/\partial w_i$ indicates the amount of gain in information, as measured by $\phi$, by
a slight increase in $w_i$, the weight on the $i$th design point. So (3) can be seen as adjusting $w$ so
that relatively more weight is placed on design points whose increased weight may result in a
larger gain in $\phi$. If $\phi$ is increasing and concave, then a convenient convergence criterion, based
on the general equivalence theorem (Kiefer and Wolfowitz, 1960; Whittle, 1973), is

$$\max_{1 \le i \le n} d_i(w^{(t)}) \le (1 + \delta)\,\bar{d}(w^{(t)}), \tag{4}$$

where $\bar{d}(w) \equiv \sum_{i=1}^n w_i d_i(w)$ and $\delta$ is a small positive constant.
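To make the update (2) and the stopping rule (4) concrete, here is a minimal sketch in Python for the D-criterion only, restricted to $m = 2$ so that no linear-algebra library is needed. The function names (`moment`, `inv2`, `trace_prod`, `algorithm_one`) and the toy design space are assumptions made for illustration; they do not come from the paper.

```python
def moment(w, A):
    """M(w) = sum_i w_i A_i, with 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(wi * Ai[r][c] for wi, Ai in zip(w, A)) for c in range(2))
        for r in range(2)
    )


def inv2(M):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def trace_prod(P, Q):
    """tr(PQ) for 2x2 matrices."""
    return sum(P[r][c] * Q[c][r] for r in range(2) for c in range(2))


def algorithm_one(A, w, lam=1.0, delta=1e-4, max_iter=10000):
    """Iterate (2) for phi = log det until criterion (4) holds; returns (w, iterations)."""
    for t in range(max_iter):
        M_inv = inv2(moment(w, A))                 # phi'(M) = M^{-1} for log det
        d = [trace_prod(M_inv, Ai) for Ai in A]    # d_i(w) = tr(phi'(M(w)) A_i)
        d_bar = sum(wi * di for wi, di in zip(w, d))
        if max(d) <= (1 + delta) * d_bar:          # stopping rule (4)
            return w, t
        unnorm = [wi * di ** lam for wi, di in zip(w, d)]
        total = sum(unnorm)
        w = [u / total for u in unnorm]            # update (2)
    return w, max_iter


# Hypothetical toy problem: straight-line regression on z in {-1, 0, 1},
# with A_i = x_i x_i^T and x_i = (1, z_i).
A = [((1.0, z), (z, z * z)) for z in (-1.0, 0.0, 1.0)]
w, iters = algorithm_one(A, [1 / 3, 1 / 3, 1 / 3])
print(w, iters)
```

In this toy problem the weight on $z = 0$ shrinks at every step, and the run ends with nearly all weight split evenly between $z = -1$ and $z = 1$.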



Algorithm I is remarkable in its generality. For example, little restriction is placed on the
underlying model $p(y|x, \theta)$. Part of the reason, of course, is that we focus on Fisher information
and local optimality, which essentially reduces the problem to a linear one.

There exists a large literature on Algorithm I and its relatives; see, for example, Titterington
(1976, 1978), Silvey et al. (1978), Pazman (1986), Fellman (1989), Pukelsheim and Torsney
(1991), Torsney and Mandal (2006), Harman and Pronzato (2007), Dette et al. (2008), and
Torsney and Martín-Martín (2009). One feature that has attracted much attention is that
Algorithm I appears to be monotonic, i.e., $\phi(M(w^{(t)}))$ increases in $t$, at least in some special
cases. For example, when $\phi = \phi_0$ (for D-optimality) and $\lambda = 1$, Titterington (1976) and
Pazman (1986) have shown monotonicity using clever probabilistic and analytic inequalities;
see also Dette et al. (2008) and Harman and Trnovska (2009). Algorithm I is also known to be
monotonic for $\phi = \phi_{-1,K}$ as in (1), assuming $\lambda = 1/2$ and $A_i$ are rank-one (Fellman 1974; Torsney
1983). Monotonicity is important because convergence then holds under mild assumptions (see
Section 4). Results in these special cases suggest a monotonic convergence theory for a broad
class of $\phi$, which is also supported by numerical evidence presented in some of the references
above.

2 Main result 

We aim to state general conditions on $\phi$ that ensure that Algorithm I converges monotonically.
As a consequence certain known theoretical results are unified and generalized, and one particular
conjecture (Titterington 1978) is confirmed. Define

$$\psi(M) \equiv -\phi(M^{-1}), \quad M > 0.$$

The functions $\phi$ and $\psi$ are assumed to be differentiable on invertible matrices. Our conditions
are conveniently stated in terms of $\psi$. As usual, for two symmetric matrices, $M_1 \le (<)\, M_2$
means $M_2 - M_1$ is nonnegative (positive) definite.

• $\psi(M)$ is increasing:

$$0 < M_1 \le M_2 \implies \psi(M_1) \le \psi(M_2). \tag{5}$$

• $\psi(M)$ is concave:

$$\alpha\psi(M_1) + (1 - \alpha)\psi(M_2) \le \psi(\alpha M_1 + (1 - \alpha)M_2) \tag{6}$$

for $\alpha \in [0, 1]$, $M_1, M_2 > 0$. Equivalently,

$$\psi(M_2) \le \psi(M_1) + \mathrm{tr}(\psi'(M_1)(M_2 - M_1)), \quad M_1, M_2 > 0. \tag{7}$$

Condition (5) is usually satisfied by any reasonable information criterion (Pukelsheim 1993).
Also note that, if (5) fails, then $\partial\phi(M(w))/\partial w_i$ on the right hand side of (3) is not even
guaranteed to be nonnegative. The real restriction is the concavity condition (6). For example, (6)
is not satisfied by $\psi_p(M) = -\phi_p(M^{-1})$ (the $p$th mean criterion) when $p < -1$. (It is usually
assumed that $\phi(M)$, rather than $\psi(M)$, is concave.) Nevertheless, (6) is satisfied by a wide
range of criteria, including the commonly used D-, A- or c-criteria (see Cases (i) and (ii) in the
illustration of the main result below).

Our main result is as follows.

Theorem 1 (General monotonicity). Assume (5) and (6). Assume that in iteration (2), with
$0 < \lambda \le 1$, we have

$$M(w^{(t)}) > 0, \quad \phi'(M(w^{(t)})) \ne 0, \quad \text{and} \quad M(w^{(t+1)}) > 0.$$

Then

$$\phi(M(w^{(t+1)})) \ge \phi(M(w^{(t)})).$$

In other words, under mild conditions which ensure that (2) is well-defined (specifically, the
denominator in (2) is nonzero), (5) and (6) imply that (2) never decreases the criterion $\phi$. Let
us illustrate Theorem 1 with some examples. For simplicity, in (i)-(iv) we display formulae for
$\lambda = 1$ only, although monotonicity holds for all $\lambda \in (0, 1]$.

(i) Take

$$\phi_p(M) = \begin{cases} \log\det M, & p = 0; \\ -\mathrm{tr}(M^p), & p \in [-1, 0). \end{cases}$$

Then $\psi_p(M) \equiv -\phi_p(M^{-1})$ satisfies (5) and (6). By Theorem 1, Algorithm I is monotonic for
$\phi = \phi_p$, $p \in [-1, 0]$. This generalizes the previously known cases $p = 0$ and $p = -1$ (with
particular values of $\lambda$). The iteration (2) reads

$$w_i^{(t+1)} = w_i^{(t)}\,\frac{\mathrm{tr}(M^{p-1}(w^{(t)})A_i)}{\mathrm{tr}(M^{p}(w^{(t)}))}, \quad i = 1, \ldots, n.$$
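As a brief check of how the displayed update arises from (2) with $\lambda = 1$, the following worked step may help. It is a sketch that assumes the standard matrix-derivative identity $\partial\,\mathrm{tr}(M^p)/\partial M = pM^{p-1}$ for symmetric positive definite $M$, which is not stated in the text above.

```latex
% Case p in [-1, 0): phi_p(M) = -tr(M^p), lambda = 1
\phi_p'(M) = -p\,M^{p-1}, \qquad
d_i(w) = -p\,\mathrm{tr}\bigl(M^{p-1}(w)A_i\bigr), \\
\sum_{j=1}^n w_j d_j(w)
  = -p\,\mathrm{tr}\Bigl(M^{p-1}(w)\sum_{j=1}^n w_jA_j\Bigr)
  = -p\,\mathrm{tr}\bigl(M^{p}(w)\bigr), \\
\frac{d_i(w)}{\sum_{j} w_j d_j(w)}
  = \frac{\mathrm{tr}\bigl(M^{p-1}(w)A_i\bigr)}{\mathrm{tr}\bigl(M^{p}(w)\bigr)}.
% Case p = 0: phi_0'(M) = M^{-1}, so d_i = tr(M^{-1}A_i) and the denominator is tr(I_m) = m.
```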

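(ii) More generally, given a full rank $m \times r$ matrix $K$ ($r \le m$), consider

$$\psi_{p,K}(M^{-1}) \equiv -\phi_{p,K}(M) = \begin{cases} \log\det(K^T M^{-1} K), & p = 0; \\ \mathrm{tr}((K^T M^{-1} K)^{-p}), & p \in [-1, 0). \end{cases}$$

Then $\psi_{p,K}(M)$ satisfies (5) and (6). By Theorem 1, Algorithm I is monotonic for $\phi = \phi_{p,K}$,
$p \in [-1, 0]$. The iteration (2) reads

$$w_i^{(t+1)} = w_i^{(t)} \left.\frac{\mathrm{tr}(M^{-1}K(K^T M^{-1} K)^{-p-1}K^T M^{-1}A_i)}{\mathrm{tr}((K^T M^{-1} K)^{-p})}\right|_{M = M(w^{(t)})}, \quad i = 1, \ldots, n. \tag{8}$$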

(iii) In particular, taking $r = 1$, $K = c$ (an $m \times 1$ vector) and $p = -1$ in Case (ii), we obtain
that Algorithm I is monotonic for the c-criterion $\phi_{-1,c}$. The iteration (8) reduces to

$$w_i^{(t+1)} = w_i^{(t)}\,\frac{c^T M^{-1}(w^{(t)})A_i M^{-1}(w^{(t)})c}{c^T M^{-1}(w^{(t)})c}, \quad i = 1, \ldots, n.$$

As noted by a referee, with $p = -1$, the choice $\lambda = 1$ may lead to an oscillating behavior in
the sense that $w^{(t)}$ alternates between two points at which $\phi_{-1,c}(M(w))$ takes the same value.
While this does not contradict Theorem 1, it suggests that other values of $\lambda$ are more desirable
for fast convergence. Following Fellman (1974) and Torsney (1983), a practical recommendation
is $\lambda = 1/2$ in the $p = -1$ case.

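(iv) Consider another example of Case (ii), with $p = 0$, $r = m - 1$ and $K = (0_r, I_r)^T$.
Henceforth $0_r$ denotes the $r \times 1$ vector of zeros, and $I_r$ denotes the $r \times r$ identity matrix.
Assume $A_i = x_i x_i^T$, $x_i^T = (1, z_i^T)$ and $z_i$ is $(m - 1) \times 1$. This corresponds to a D-optimal design
problem for $(\theta_2, \ldots, \theta_m)$ under the linear model

$$y \mid (z, \theta) \sim N(x^T\theta, \sigma^2), \quad x^T = (1, z^T),$$

where the parameter is $\theta = (\theta_1, \theta_2, \ldots, \theta_m)^T$. That is, interest centers on all coefficients other
than the intercept. Nevertheless, as far as the design measure $w$ is concerned, the optimality
criterion, $\phi_{0,K}(M)$, coincides with $\phi_0(M)$, i.e.,

$$-\log\det(K^T M^{-1}(w) K) = \log\det M(w).$$

After some algebra, (8) reduces to

$$w_i^{(t+1)} = w_i^{(t)}\,\frac{(z_i - \bar{z})^T M_c^{-1} (z_i - \bar{z})}{m - 1}, \quad i = 1, \ldots, n, \tag{9}$$

where

$$\bar{z} = \sum_{i=1}^n w_i^{(t)} z_i; \qquad M_c = \sum_{i=1}^n w_i^{(t)} (z_i - \bar{z})(z_i - \bar{z})^T.$$

Thus (9) satisfies $\det M(w^{(t+1)}) \ge \det M(w^{(t)})$.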

Monotonicity of (9) has been conjectured since Titterington (1978), and considerable
numerical evidence has accumulated over the years. Recently, extending the arguments of Pazman
(1986), Dette et al. (2008) have obtained results which come very close to resolving
Titterington's conjecture. Nevertheless, we have been unable to extend their arguments further. Instead
we prove the general Theorem 1 using a different approach, and settle this conjecture as a
consequence.

The proof of Theorem 1 is achieved by using a method of auxiliary variables. When a function
$f(w)$ (e.g., $-\det M(w)$) to be minimized is complicated, we introduce a new variable $Q$ and a
function $g(w, Q)$ such that $\min_Q g(w, Q) = f(w)$ for all $w$, thus transforming the problem into
minimizing $g(w, Q)$ over $w$ and $Q$ jointly. Then we may use an iterative conditional minimization
strategy on $g(w, Q)$. This is inspired by the EM algorithm (Dempster et al. 1977; Meng and van
Dyk 1997); in particular, see Csiszár and Tusnády's (1984) interpretation (see Yu (2008) for a
related interpretation of the data augmentation algorithm).

In Section 3 we analyze Algorithm I using this strategy. Although attention is paid to the 
mathematics, our focus is on intuitively appealing interpretations, which may lead to further 
extensions of Algorithm I with the same desirable monotonicity properties. If the algorithm is 
monotonic, then convergence can be established under mild conditions (Section 4). Section 5 
contains an illustration with optimal designs for a simple logistic regression model. 

3 Explaining the monotonicity 

A key observation is that the problem of maximizing $\phi(M(w))$, or, equivalently, minimizing
$\psi(M^{-1}(w))$, can be formulated as a joint minimization over both the design and the estimator.
Specifically, let us compare the original Problem P1 with its companion P2. Throughout $A^{1/2}$
denotes the symmetric nonnegative definite (SNND) square root of an SNND matrix $A$.

Problem P1: Minimize $-\phi(M(w)) = \psi\bigl((\sum_{i=1}^n w_i A_i)^{-1}\bigr)$ over $w \in \Omega$.

Problem P2: Minimize

$$g(w, Q) = \psi(Q\Delta_w Q^T) \tag{10}$$

over $w \in \Omega$ and $Q$ (an $m \times (mn)$ matrix), subject to $QG = I_m$, where

$$\Delta_w \equiv \mathrm{Diag}(w_1^{-1}, \ldots, w_n^{-1}) \otimes I_m; \qquad G \equiv (A_1^{1/2}, \ldots, A_n^{1/2})^T.$$

Though not immediately obvious, P1 and P2 are equivalent, and this may be explained in
statistical terms as follows. In (10), $Q\Delta_w Q^T$ is simply the variance matrix of a linear unbiased
estimator, $QY$, of the $m \times 1$ parameter $\theta$ in the model

$$Y = G\theta + \epsilon, \quad \epsilon \sim N(0, \Delta_w),$$

where $Y$ is the $(mn) \times 1$ vector of observations. The constraint $QG = I_m$ ensures unbiasedness.
(Note that $G$ is full-rank since $M(w)$ is nonsingular by assumption.) Of course, the weighted
least squares (WLS) estimator is the best linear unbiased estimator, having the smallest variance
matrix (in the sense of positive definite ordering) and, by (5), the smallest $\psi$ for that matrix. It
follows that, for fixed $w$, $g(w, Q)$ is minimized by choosing $QY$ as the WLS estimator:

$$g(w, Q_{WLS}) = \inf_{QG = I_m} g(w, Q), \tag{11}$$

$$Q_{WLS} = M^{-1}(w)\,(w_1 A_1^{1/2}, \ldots, w_n A_n^{1/2}). \tag{12}$$

However, from (10) and (12) we get

$$g(w, Q_{WLS}) = \psi(M^{-1}(w)). \tag{13}$$

That is, P2 reduces to P1 upon minimizing over $Q$.
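The reduction in (13) can be verified directly from (12). The short calculation below is a sketch that uses only the definitions of $G$ and of the weighting matrix in (10), written here as $\Delta_w$.

```latex
Q_{WLS}\,G = M^{-1}(w)\sum_{i=1}^n w_i A_i^{1/2}A_i^{1/2}
           = M^{-1}(w)\,M(w) = I_m, \\
Q_{WLS}\,\Delta_w\,Q_{WLS}^T
  = M^{-1}(w)\Bigl(\sum_{i=1}^n w_i^{2}\,w_i^{-1}A_i\Bigr)M^{-1}(w)
  = M^{-1}(w), \\
g(w, Q_{WLS}) = \psi\bigl(Q_{WLS}\Delta_w Q_{WLS}^T\bigr) = \psi\bigl(M^{-1}(w)\bigr).
```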

Since P2 is not immediately solvable, it is natural to consider the subproblems: (i) minimizing
$g(w, Q)$ over $Q$ for fixed $w$, and (ii) minimizing $g(w, Q)$ over $w$ for fixed $Q$. Part (ii) is again
formulated as a joint minimization problem. For a fixed $m \times (mn)$ matrix $Q$ such that $QG = I_m$,
let us consider Problems P3 and P4.

Problem P3: Minimize $g(w, Q)$ as in (10) over $w \in \Omega$.

Problem P4: Minimize the function

$$h(\Sigma, w, Q) = \psi(\Sigma) + \mathrm{tr}\bigl(\psi'(\Sigma)(Q\Delta_w Q^T - \Sigma)\bigr) \tag{14}$$

over $w \in \Omega$ and the $m \times m$ positive-definite matrix $\Sigma$.

The concavity assumption (7) implies that

$$h(\Sigma, w, Q) \ge \psi(Q\Delta_w Q^T), \tag{15}$$

with equality when $\Sigma = Q\Delta_w Q^T$, i.e., Problem P4 reduces to P3 upon minimizing over $\Sigma$.

Since P4 is not immediately solvable, it is natural to consider the subproblems: (i) minimizing
$h(\Sigma, w, Q)$ over $\Sigma$ for fixed $w$ and $Q$, and (ii) minimizing $h(\Sigma, w, Q)$ over $w$ for fixed $\Sigma$ and $Q$.
Part (ii), which amounts to minimizing

$$\mathrm{tr}\bigl(\psi'(\Sigma)Q\Delta_w Q^T\bigr) = \mathrm{tr}\bigl(Q^T\psi'(\Sigma)Q\Delta_w\bigr),$$

admits a closed-form solution: if we write $Q = (Q_1, \ldots, Q_n)$ where each $Q_i$ is $m \times m$, then $w_i^2$
should be proportional to $\mathrm{tr}(Q_i^T\psi'(\Sigma)Q_i)$. But Algorithm I may not perform an exact
minimization here; see (16).
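To see why the minimizing weights satisfy $w_i^2 \propto \mathrm{tr}(Q_i^T\psi'(\Sigma)Q_i)$, one possible argument goes through the Cauchy-Schwarz inequality. In the sketch below, $r_i$ is shorthand for $\mathrm{tr}(Q_i^T\psi'(\Sigma)Q_i)$, assumed positive, and the block-diagonal form of the weighting matrix in (10), written $\Delta_w$, is used.

```latex
\mathrm{tr}\bigl(Q^T\psi'(\Sigma)Q\,\Delta_w\bigr) = \sum_{i=1}^n \frac{r_i}{w_i},
\qquad r_i \equiv \mathrm{tr}\bigl(Q_i^T\psi'(\Sigma)Q_i\bigr), \\
\Bigl(\sum_{i=1}^n \sqrt{r_i}\Bigr)^2
  = \Bigl(\sum_{i=1}^n \sqrt{\tfrac{r_i}{w_i}}\,\sqrt{w_i}\Bigr)^2
  \le \Bigl(\sum_{i=1}^n \frac{r_i}{w_i}\Bigr)\Bigl(\sum_{i=1}^n w_i\Bigr)
  = \sum_{i=1}^n \frac{r_i}{w_i}, \\
\text{with equality iff } \sqrt{r_i / w_i} \propto \sqrt{w_i},
\text{ i.e. } w_i \propto \sqrt{r_i}.
```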

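Based on the above discussion, we can express Algorithm I as an iterative conditional
minimization algorithm involving $w$, $Q$ and $\Sigma$. At iteration $t$, define

$$Q^{(t)} = (Q_1^{(t)}, \ldots, Q_n^{(t)}); \qquad Q_i^{(t)} = w_i^{(t)} M^{-1}(w^{(t)}) A_i^{1/2}, \quad i = 1, \ldots, n;$$

$$\Sigma^{(t)} = Q^{(t)} \Delta_{w^{(t)}} Q^{(t)T} = M^{-1}(w^{(t)}).$$

Then we have

\begin{align}
\psi(M^{-1}(w^{(t)})) &= g(w^{(t)}, Q^{(t)}) && \text{(by (13))} \notag \\
&= h(\Sigma^{(t)}, w^{(t)}, Q^{(t)}) && \text{(by (14))} \notag \\
&\ge h(\Sigma^{(t)}, w^{(t+1)}, Q^{(t)}) && \text{(see below)} \tag{16} \\
&\ge g(w^{(t+1)}, Q^{(t)}) && \text{(by (10), (15))} \tag{17} \\
&\ge \psi(M^{-1}(w^{(t+1)})) && \text{(by (11), (13))}. \tag{18}
\end{align}

The choice of $w^{(t+1)}$ leads to (16) as follows. After simple algebra, the iteration (2) becomes

$$w_i^{(t+1)} = \frac{r_i^{\lambda}\,(w_i^{(t)})^{1-2\lambda}}{\sum_{j=1}^n r_j^{\lambda}\,(w_j^{(t)})^{1-2\lambda}}, \quad i = 1, \ldots, n,$$

where

$$r_i = \mathrm{tr}\bigl(Q_i^{(t)T}\psi'(\Sigma^{(t)})Q_i^{(t)}\bigr).$$

Since $h(\Sigma^{(t)}, w, Q^{(t)})$ depends on $w$ only through $\sum_{i=1}^n r_i/w_i$, (16) is the statement that this sum
does not increase. Since $0 < \lambda \le 1$, Jensen's inequality yields

$$\sum_{i=1}^n w_i^{(t)}\left(\frac{r_i}{(w_i^{(t)})^2}\right)^{1-\lambda} \le \left(\sum_{i=1}^n \frac{r_i}{w_i^{(t)}}\right)^{1-\lambda}; \qquad \sum_{i=1}^n w_i^{(t)}\left(\frac{r_i}{(w_i^{(t)})^2}\right)^{\lambda} \le \left(\sum_{i=1}^n \frac{r_i}{w_i^{(t)}}\right)^{\lambda}.$$

That is,

$$\sum_{i=1}^n r_i^{1-\lambda}(w_i^{(t)})^{2\lambda-1} \le \left(\sum_{i=1}^n \frac{r_i}{w_i^{(t)}}\right)^{1-\lambda}; \qquad \sum_{i=1}^n r_i^{\lambda}(w_i^{(t)})^{1-2\lambda} \le \left(\sum_{i=1}^n \frac{r_i}{w_i^{(t)}}\right)^{\lambda}.$$

Hence

$$\sum_{i=1}^n \frac{r_i}{w_i^{(t)}} \ge \left(\sum_{i=1}^n r_i^{1-\lambda}(w_i^{(t)})^{2\lambda-1}\right)\left(\sum_{i=1}^n r_i^{\lambda}(w_i^{(t)})^{1-2\lambda}\right) = \sum_{i=1}^n \frac{r_i}{w_i^{(t+1)}},$$

which produces (16). Choosing $\lambda = 1/2$, i.e., $w_i^{(t+1)} \propto \sqrt{r_i}$, leads to exact minimization in (16);
choosing $\lambda = 1$ yields equality in (16). But any choice of $w^{(t+1)}$ that decreases $h(\Sigma^{(t)}, w, Q^{(t)})$
at (16) would have resulted in the desired inequality

$$\psi(M^{-1}(w^{(t)})) \ge \psi(M^{-1}(w^{(t+1)})).$$
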
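The Jensen step behind (16) can also be checked numerically. The sketch below assumes the reading that the $w$-dependent part of $h$ is $\sum_i r_i / w_i$ with positive constants $r_i$, and that (2) acts as $w_i \mapsto r_i^{\lambda} w_i^{1-2\lambda}$ up to normalization; the function names and the random test ranges are hypothetical choices for illustration.

```python
import random


def weighted_cost(r, w):
    """sum_i r_i / w_i, the part of h(Sigma, w, Q) that depends on w."""
    return sum(ri / wi for ri, wi in zip(r, w))


def update(r, w, lam):
    """w_i <- r_i^lam * w_i^(1 - 2*lam), renormalized to sum to one."""
    unnorm = [ri ** lam * wi ** (1 - 2 * lam) for ri, wi in zip(r, w)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def never_increases(trials=1000, n=5, seed=0):
    """Check cost(update(w)) <= cost(w) on random positive r, w and lam in (0, 1]."""
    rng = random.Random(seed)
    for _ in range(trials):
        r = [rng.uniform(0.1, 10.0) for _ in range(n)]
        raw = [rng.uniform(0.1, 1.0) for _ in range(n)]
        w = [x / sum(raw) for x in raw]
        lam = 1.0 - rng.random()                   # uniform on (0, 1]
        # small relative tolerance absorbs floating-point rounding
        if weighted_cost(r, update(r, w, lam)) > weighted_cost(r, w) * (1 + 1e-12):
            return False
    return True


print(never_increases())
```

With `lam=0.5` the update returns weights proportional to $\sqrt{r_i}$ regardless of the starting weights, which matches the exact-minimization remark above.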
We may allow $\lambda$ to change from iteration to iteration, and monotonicity still holds, as long as
$\lambda \in (0, 1]$. See Silvey et al. (1978) and Fellman (1989) for investigations concerning the choice
of $\lambda$. Also note that we assume $w_i^{(t)}, w_i^{(t+1)} > 0$ for all $i$. This is not essential, however, because
(i) the possibility of $w_i^{(t)} = 0$ can be handled by restricting our analysis to all design points $i$
such that $w_i^{(t)} > 0$, and (ii) the possibility of $w_i^{(t+1)} = 0$ can be handled by a standard limiting
argument. Monotonicity holds as long as $M(w^{(t)})$ and $M(w^{(t+1)})$ are both positive definite, as
noted in the statement of Theorem 1.



4 Global convergence 

Monotonicity (Theorem 1) plays an important role in the following convergence theorem.

Theorem 2 (Global convergence). Denote the mapping (2) by $T$.

(a) Assume

$$\phi'(M(w)) \ge 0; \quad \phi'(M(w))A_i \ne 0, \quad w \in \Omega_+, \ i = 1, \ldots, n.$$

(b) Assume (2) is strictly monotonic, i.e.,

$$w \in \Omega_+, \ Tw \ne w \implies \phi(M(Tw)) > \phi(M(w)). \tag{19}$$

(c) Assume $\phi$ is strictly concave and $\phi'$ is continuous on positive definite matrices.

(d) Assume that, if $M$ (a positive definite matrix) tends to $M^*$ such that $\phi(M)$ increases
monotonically, then $M^*$ is nonsingular.

Let $w^{(t)}$ be generated by (2) with $w_i^{(0)} > 0$ for all $i$. Then

(i) all limit points of $w^{(t)}$ are global maxima of $\phi(M(w))$ on $\Omega_+$, and

(ii) as $t \to \infty$, $\phi(M(w^{(t)}))$ increases monotonically to $\sup_{w \in \Omega_+} \phi(M(w))$.

The proof of Theorem 2 is somewhat subtle. Standard arguments show that all limit points
of $w^{(t)}$ are fixed points of the mapping $T$. This alone does not imply convergence to a global
maximum, however, because there often exist sub-optimal fixed points on the boundary of $\Omega$.
(Global maxima occur routinely on the boundary also.) Our goal is therefore to rule out possible
convergence to such sub-optimal points; details of the proof are presented in the Appendix. We
shall comment on Conditions (a)-(d).

Condition (a) ensures that starting with $w^{(0)} \in \Omega_+$, all iterations are well-defined. Moreover,
if $w_i^{(0)} > 0$ for all $i$, then $w_i^{(t)} > 0$ for all $t$ and $i$. This highlights the basic idea that, in order
to converge to a global maximum $w^*$, the starting value $w^{(0)}$ must assign positive weight to
every support point of $w^*$. Such a requirement is not necessary for monotonicity. On the other
hand, assigning weight to non-supporting points of $w^*$ tends to slow the algorithm down. Hence
methods that quickly eliminate non-optimal support points are valuable (Harman and Pronzato,
2007).

Condition (b) simply says that unless $w$ is a fixed point, the mapping $T$ should produce a
better solution. Let us assume (5), (7) and Condition (a), so that Theorem 1 applies. Then,
by checking the equality condition in (16), it is easy to see that Condition (b) is satisfied if
$0 < \lambda < 1$. (The argument leading to (19) technically assumes that all coordinates of $w$ are
nonzero, but we can apply it to the appropriate subvector of $w$.) If $\lambda = 1$, then (16) reduces to
an equality. However, by checking the equality conditions in (17) and (18), we can show that
Condition (b) is satisfied if $\psi$ is strictly increasing and strictly concave:

$$M_2 \ge M_1 > 0, \ M_1 \ne M_2 \implies \psi(M_1) < \psi(M_2); \tag{20}$$

$$M_1, M_2 > 0, \ M_1 \ne M_2 \implies \psi(M_2) < \psi(M_1) + \mathrm{tr}(\psi'(M_1)(M_2 - M_1)). \tag{21}$$

Conditions (c) and (d) are technical requirements that concern $\phi$ alone. Condition (c) ensures
uniqueness of the optimal moment matrix, which simplifies the analysis. Condition (d) ensures
that positive definiteness of $M(w)$ is maintained in the limit. Conditions (c) and (d) are satisfied
by $\phi = \phi_p$ with $p \le 0$, for example.

Let us mention a typical example of Theorem 2.

Corollary 1. Assume $A_i \ne 0$, $w_i^{(0)} > 0$, $i = 1, \ldots, n$, and $M(w^{(0)}) > 0$. Then the conclusion
of Theorem 2 holds for Algorithm I with $\phi = \phi_0$.

Proof. Conditions (a), (c) and (d) are readily verified. Condition (b) is satisfied by (20) and
(21). The claim follows from Theorem 2. □

When (20) or (21) fails, and $\lambda = 1$, it is often difficult to appeal to Theorem 2 because
strict monotonicity (Condition (b)) may not hold. We illustrate this with an example where
the monotonicity is not strict, and the algorithm does not converge (see Pronzato et al. 2000,
Chapter 7; see also the remark in Case (iii) following Theorem 1). Consider iteration (9) ($\lambda = 1$)
with $n = m = 2$ and design space $\mathcal{X} = \{x_i = (1, z_i)^T, \ i = 1, 2\}$, $z_1 = -z_2 = 1$. It is easy to
show that, for any $w^{(t)} = (w_1, w_2) \in \Omega$, iteration (9) maps $w^{(t)}$ to $w^{(t+1)} = (w_2, w_1)$. Thus,
unless $w_1 = w_2 = 1/2$ to begin with, the algorithm alternates between two distinct points. This
appears to be a rare example, as (9) usually converges in practical situations.
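The two-point example can be reproduced in a few lines. This is a sketch for scalar $z_i$ (so $m = 2$ and $M_c$ is a number); the function name `step_nine` is hypothetical, and the `lam` argument is an assumption of the sketch that applies the general update (2) with the same $d_i$ as in (9), whereas the text displays (9) only for $\lambda = 1$.

```python
def step_nine(w, z, lam=1.0):
    """One pass of iteration (9) for scalar z_i (m = 2), with exponent lam."""
    z_bar = sum(wi * zi for wi, zi in zip(w, z))
    m_c = sum(wi * (zi - z_bar) ** 2 for wi, zi in zip(w, z))
    d = [(zi - z_bar) ** 2 / m_c for zi in z]      # (z_i - z_bar)^T M_c^{-1} (z_i - z_bar)
    unnorm = [wi * di ** lam for wi, di in zip(w, d)]
    total = sum(unnorm)                            # equals m - 1 = 1 when lam = 1
    return [u / total for u in unnorm]


z = [1.0, -1.0]
w0 = [0.3, 0.7]
w1 = step_nine(w0, z)                # lam = 1: the two weights swap
w2 = step_nine(w1, z)                # and swap back
w_half = step_nine(w0, z, lam=0.5)   # lam = 1/2: equal weights after one step
print(w1, w2, w_half)
```

With $\lambda = 1$ the iterates alternate between $(0.3, 0.7)$ and $(0.7, 0.3)$, while in this particular example $\lambda = 1/2$ lands on $(1/2, 1/2)$ immediately.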



5 Further remarks and illustrations 

One can think of several reasons for the wide interest in Algorithm I and its relatives. Similar 
to the EM algorithm, Algorithm I is simple, easy to implement, and monotonically convergent 
for a large class of optimality criteria (although this was not proved in the present generality). 
Algorithm I is known to be slow sometimes. But it serves as a foundation upon which more 
effective variants can be built (see, e.g., Harman and Pronzato 2007, and Dette et al. 2008). 
While solving the conjectured monotonicity of (9) holds mathematical interest, our main
contribution is a way of interpreting such algorithms as optimization on augmented spaces. This opens
up new possibilities in constructing algorithms with the same desirable monotonic convergence 
properties. 

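As a numerical example, consider the logistic regression model

$$p(y|x, \theta) = \frac{\exp(y\,x^T\theta)}{1 + \exp(x^T\theta)}, \quad y = 0, 1.$$

The expected Fisher information for $\theta$ from a unit assigned to $x_i$ is

$$A_i = \frac{\exp(x_i^T\theta)}{(1 + \exp(x_i^T\theta))^2}\,x_i x_i^T.$$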

Figure 1: Values of $\phi_0 = \log\det M$ and $\phi_{-2} = -\mathrm{tr}(M^{-2})$ for Algorithm I with design spaces $\mathcal{X}_1$
and $\mathcal{X}_2$, plotted against iteration.

We compute locally optimal designs with prior guess $\theta^* = (1, 1)^T$ ($m = 2$), and design spaces

$$\mathcal{X}_1 = \{x_i = (1, i/20)^T, \ i = 1, \ldots, 20\};$$
$$\mathcal{X}_2 = \{x_i = (1, i/10)^T, \ i = 1, \ldots, 30\}.$$

The design criteria considered are $\phi_0$ (for D-optimality) and $\phi_{-2}$. We use Algorithm I with
$\lambda = 1$, starting with equally weighted designs.

For $\phi_0$, Corollary 1 guarantees monotonic convergence. This is illustrated by the first row of
Figure 1, where $\phi_0 = \log\det M(w)$ is plotted against iteration $t$. Using the convergence criterion
(4) with $\delta = 0.0001$, the number of iterations until convergence is 93 for $\mathcal{X}_1$ and 2121 for $\mathcal{X}_2$. The
actual locally D-optimal designs are $w_1 = w_{20} = 0.5$ for $\mathcal{X}_1$ and $w_1 = w_{23} = 0.5$ for $\mathcal{X}_2$, as can
be verified using the general equivalence theorem. This simple example serves to illustrate both
the monotonicity of Algorithm I (when Theorem 1 applies) and its potential slow convergence.
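The D-optimality run on the first design space can be sketched as follows. This is not the authors' code: the function names are hypothetical, the $2 \times 2$ inverse is written out by hand, and the position of the convergence check inside the loop is a choice made here, so the iteration count need not match the reported 93 exactly.

```python
import math


def logistic_info(x, theta):
    """A_i = exp(eta) / (1 + exp(eta))^2 * x x^T with eta = x^T theta (2x2 case)."""
    eta = sum(a * b for a, b in zip(x, theta))
    s = math.exp(eta) / (1 + math.exp(eta)) ** 2
    return ((s * x[0] * x[0], s * x[0] * x[1]), (s * x[1] * x[0], s * x[1] * x[1]))


def run_d_optimal(X, theta, delta=1e-4, max_iter=5000):
    """Algorithm I with lambda = 1 and phi = log det; returns (weights, log det history)."""
    A = [logistic_info(x, theta) for x in X]
    n = len(A)
    w = [1.0 / n] * n                       # equally weighted starting design
    history = []
    for _ in range(max_iter):
        a = sum(wi * Ai[0][0] for wi, Ai in zip(w, A))
        b = sum(wi * Ai[0][1] for wi, Ai in zip(w, A))
        c = sum(wi * Ai[1][1] for wi, Ai in zip(w, A))
        det = a * c - b * b                 # det M(w), M(w) = [[a, b], [b, c]]
        history.append(math.log(det))
        # d_i = tr(M^{-1} A_i), using M^{-1} = [[c, -b], [-b, a]] / det
        d = [(c * Ai[0][0] - 2 * b * Ai[0][1] + a * Ai[1][1]) / det for Ai in A]
        if max(d) <= (1 + delta) * 2:       # criterion (4); d_bar = tr(M^{-1} M) = m = 2
            break
        w = [wi * di / 2 for wi, di in zip(w, d)]   # update (2) with lambda = 1
    return w, history


X1 = [(1.0, i / 20) for i in range(1, 21)]
w, history = run_d_optimal(X1, (1.0, 1.0))
print(len(history) - 1, max(w))
```

The recorded `history` of $\log\det M(w^{(t)})$ values should be nondecreasing, in line with Theorem 1.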

For $\phi_{-2}$, although Algorithm I can be implemented just as easily, Theorem 1 does not apply,
because the concavity condition (7) no longer holds. Indeed, Algorithm I (with $\lambda = 1$) is not
monotonic, as is evident from the second row of Figure 1, where $\phi_{-2} = -\mathrm{tr}(M^{-2}(w))$ is plotted
against iteration $t$. This shows the potential danger of using Algorithm I when monotonicity is
not guaranteed.

Although Theorem 1 does not cover the $\phi_p$ criterion for $p < -1$, it is still possible that
monotonicity holds for a smaller range of $\lambda$. Calculations in special cases lead to the conjecture
(Silvey et al. 1978) that Algorithm I is monotonic if $0 < \lambda < 1/(1 - p)$. Theorem 1 provides
further evidence for this conjecture, but new insights are needed to resolve it.

We have focused on local optimality. An alternative, Bayesian optimality (Chaloner and
Larntz, 1989; Chaloner and Verdinelli, 1995), seeks to maximize the expected value of $\phi(M(\theta; w))$
over a prior distribution $\pi(\theta)$. The notation $M(\theta; w)$ emphasizes the dependence of the moment
matrix on the parameter $\theta$. It would be worthwhile to extend our strategy in Section 3 to
Bayesian optimality, and we plan to report both theoretical and empirical evaluations of such
extensions in future works.

Appendix: Proof of Theorem 2

Lemma 1. Any limit point $w^*$ of $w^{(t)}$ is a fixed point of $T$, i.e., $Tw^* = w^*$.

Proof. Let $w^{(t_j)}$ be a subsequence converging to $w^*$. By Conditions (b) and (d), $M(w^*)$ is
positive definite, i.e., $w^* \in \Omega_+$. Moreover, since both $T$ and the function $\phi(M(\cdot))$ are continuous
on $\Omega_+$, we have

$$\phi(M(w^*)) = \lim_{j \to \infty} \phi(M(w^{(t_j)})) = \lim_{j \to \infty} \phi(M(w^{(t_j + 1)})) = \phi(M(Tw^*)),$$

where the two limits are equal by monotonicity. From Condition (b) we deduce that $Tw^* =
w^*$. □

Lemma 2. Suppose $w^*$ is a limit point of $w^{(t)}$, and define $S_+ = \{w \in \Omega_+ : w_i^* = 0 \implies w_i = 0\}$,
i.e., $S_+$ collects all $w$ that are absolutely continuous with respect to $w^*$ and satisfy $M(w) > 0$.
Then we have

$$\phi(M(w^*)) = \sup_{w \in S_+} \phi(M(w)); \tag{22}$$

$$w \in S_+, \ \phi(M(w)) = \sup_{\tilde{w} \in S_+} \phi(M(\tilde{w})) \implies M(w) = M(w^*). \tag{23}$$

Proof. By Lemma 1, $w^*$ is a fixed point of $T$. That is,

$$w_i^* \ne 0 \implies \mathrm{tr}(\phi'(M(w^*))A_i) = \mathrm{tr}(\phi'(M(w^*))M(w^*)).$$

By the general equivalence theorem, $w^*$ maximizes $\phi(M(w))$ on $S_+$, and (22) is proved. The
implication (23) holds because we assume $\phi(\cdot)$ is strictly concave (Condition (c)). □

Lemma 3. The sequence $M(w^{(t)})$ has finitely many limit points.

Proof. Since $M(\cdot)$ is continuous, any limit point of $M(w^{(t)})$ is of the form $M(w^*)$ for some limit
point $w^*$ of $w^{(t)}$. By Lemma 2, $M(w^*)$ is the unique maximizer of $\phi(M)$ among all $M > 0$ that
can be written as $M = M(w)$ with $w \ll w^*$. Depending on which coordinates of $w^*$ are zero,
there are fewer than $2^n$ such "degenerate maximizers". □

Lemma 4. The limit $M_\infty \equiv \lim_{t \to \infty} M(w^{(t)})$ exists.

Proof. Assume $M(w^{(t)})$ has $L < \infty$ limit points, and let $B_i$, $i = 1, \ldots, L$, be non-intersecting
balls (neighborhoods) centered on these. We know $L \ge 1$ because $M(w^{(t)})$ is bounded; the
choice of a metric is immaterial. Again by boundedness, for large enough $t$, each $M(w^{(t)})$
belongs to exactly one of $B_i$. Assume $L \ge 2$, i.e., $M(w^{(t)})$ does not converge. Then there
exists a subsequence such that $M(w^{(t_j)})$, $M(w^{(t_j + 1)})$ always belong to different $B_i$. By passing
through a sub-subsequence if necessary, we may assume $w^{(t_j)} \to w^*$; by Lemma 1, $w^{(t_j + 1)} \to w^*$.
It follows that $M(w^{(t_j)}) \to M(w^*)$ and $M(w^{(t_j + 1)}) \to M(w^*)$, which contradicts the assumption
of distinct neighborhoods. □

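Lemma 5. The limit $M_\infty$ as defined in Lemma 4 satisfies

$$\phi(M_\infty) = \sup_{w \in \Omega_+} \phi(M(w)).$$

Proof. Let us check the conditions of the general equivalence theorem, i.e.,

$$\mathrm{tr}(\phi'(M_\infty)A_i) \le \mathrm{tr}(\phi'(M_\infty)M_\infty), \quad i = 1, \ldots, n.$$

Suppose this fails for $i = 1$, say, then by Lemma 4 there exists $\delta > 0$ such that for sufficiently
large $t$, we have

$$\mathrm{tr}\bigl(\phi'(M(w^{(t)}))A_1\bigr) > (1 + \delta)\,\mathrm{tr}\bigl(\phi'(M(w^{(t)}))M(w^{(t)})\bigr).$$

It follows from the definition of $T$ that

$$\frac{w_1^{(t+1)}}{w_1^{(t)}} > (1 + \delta)^{\lambda}\,\frac{\bigl(\sum_{i=1}^n w_i^{(t)} d_i^{(t)}\bigr)^{\lambda}}{\sum_{i=1}^n w_i^{(t)} (d_i^{(t)})^{\lambda}}, \tag{24}$$

where $d_i^{(t)} = \mathrm{tr}(\phi'(M(w^{(t)}))A_i)$, so that $\sum_{i=1}^n w_i^{(t)} d_i^{(t)} = \mathrm{tr}(\phi'(M(w^{(t)}))M(w^{(t)}))$. However, the
right hand side of (24) is at least $(1 + \delta)^{\lambda}$ due to Jensen's inequality. That is,

$$w_1^{(t+1)} > (1 + \delta)^{\lambda}\,w_1^{(t)}$$

for all $t$ large enough, which contradicts the obvious constraint $0 < w_1^{(t)} \le 1$. □

Theorem 2 then follows from Lemma 5.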